Abstract

We have generated a parametrization of the Compton form factor (CFF) $\mathcal{H}$ based on data from deeply virtual Compton scattering (DVCS) using neural networks. This approach offers an essentially model-independent fitting procedure, which provides realistic uncertainties. Furthermore, it facilitates the propagation of uncertainties from experimental data to CFFs. We assumed dominance of the CFF $\mathcal{H}$ and used HERMES data on DVCS off unpolarized protons. We predict the beam charge-spin asymmetry for a proton at the kinematics of the COMPASS II experiment.

1 Introduction 

Generalized parton distributions (GPDs) provide a detailed description of the nucleon in terms of partonic degrees of freedom, which allows one, e.g., to address the three-dimensional distribution of quarks and gluons and the partonic decomposition of the nucleon spin […]. Their determination improves our understanding of non-perturbative QCD dynamics in general. More specifically, they are a necessary (though not sufficient) input for the theoretical description of multiple-hard reactions in proton-proton collisions at the LHC and for other applications reaching beyond the limits of collinear QCD. Information on GPDs comes from many sides. An impressive experimental database for exclusive reactions already exists and is continuously being improved. A standard approach is either the confrontation of model predictions […] with these data or a least-squares fit to them. The latter approach, formulated in Mellin space, is analogous to the well-established global fitting framework for parton distribution functions (PDFs). Complementary information on moments of GPDs comes from lattice QCD.

However, compared to the situation for PDFs, extracting GPDs from data is a much more intricate task, and model ambiguities are much larger. For instance, the theoretically cleanest process used for the determination of GPDs, deeply virtual Compton scattering (DVCS), $\gamma^* p \to \gamma p$, can be parametrized by twelve Compton form factors (CFFs). At leading order in the inverse photon virtuality squared, $1/Q^2$, they are expressed as convolutions of GPDs with perturbatively calculable coefficient functions. At leading order in the QCD coupling constant, and for one of the CFFs, $\mathcal{H}$, we have



$$\mathcal{H}(x_B,t,Q^2) = \int_{-1}^{1} dx \left(\frac{1}{\xi - x - i\epsilon} - \frac{1}{\xi + x - i\epsilon}\right) H(x,\xi,t,Q^2), \tag{1}$$



where $\xi = x_B/(2 - x_B)$, $x_B$ is Bjorken's scaling variable, $t$ is the momentum transfer squared, and $H$ is the GPD. Formulae such as (1) suggest the following procedure: parametrize GPDs at some input scale $Q_0$, employ QCD evolution equations to determine the GPDs at the desired scale $Q^2$, perform the momentum-fraction convolution to obtain the CFFs, use these to calculate measured cross sections and asymmetries via known formulae […], and, finally, confront the result with experimental data by means of the least-squares method. Because GPDs cannot be fully constrained even by ideal data, and because at the input scale $Q_0$ they depend on three variables, the space of possible functions, although restricted by GPD constraints, is huge. As a result, the theoretical uncertainty induced by the choice of the fitting model is much more serious than in the PDF case, where the model functions depend at the input scale on only one variable, namely the longitudinal momentum fraction $x$. The fact that exclusive data is typically much less precise than inclusive data exacerbates the situation further.

In this paper we explore an alternative approach, in which neural networks are used in place of specific models. This essentially eliminates the problem of model dependence and, as an additional advantage, provides a convenient method to propagate uncertainties from experimental measurements to the final result. In the context of nucleon structure studies, neural networks have already been successfully applied to extract PDFs by the NNPDF group [13-15] and to parametrize electromagnetic form factors [16]. Similarly, they have been employed for the parametrization of the spectral function of hadronic tau decays [17]. In the case of GPDs, for the reasons given above, we expect the advantages of this method to be even more pronounced.

While our long-term goal is global fits of GPDs, the data presently available allows only a less ambitious analysis, namely a fit of the dominant CFF $\mathcal{H}$ using neural networks¹. This reduces the mathematical complexity of the problem (which, for GPDs, is increased further by GPD constraints), while still retaining relevance for the study of hadron structure.

The imaginary part of the CFF (1) is, at leading order (LO),

$$\mathrm{Im}\,\mathcal{H}\left(x_B = \frac{2x}{1+x}, t, Q^2\right) \stackrel{\rm LO}{=} \pi\left[H(x,\xi=x,t,Q^2) - H(-x,\xi=x,t,Q^2)\right], \tag{2}$$

and so, although we consider only the extraction of CFFs, $\mathrm{Im}\,\mathcal{H}$ provides us with direct information about the shape of the GPD $H$ on the cross-over line $\xi = x$. Moreover, a "dispersion relation"

$$\mathrm{Re}\,\mathcal{H}(x_B,t,Q^2) \stackrel{\rm LO}{=} \mathrm{PV}\int_0^1 dx \left(\frac{1}{\xi - x} - \frac{1}{\xi + x}\right)\left[H(x,x,t,Q^2) - H(-x,x,t,Q^2)\right] - \mathcal{C}(t,Q^2) \tag{3}$$

can then be used as a sum rule to pin down this GPD outside of the kinematically accessible region. Extraction of CFFs from DVCS data in a largely model-independent way was also performed in […]. In contrast to the work presented here, these analyses are local in the sense that they extract values of CFFs separately at each measured kinematic point. We emphasize that in Refs. […] the systematic uncertainties induced by the other, non-dominant CFFs were explored with model-dependent constraints for the four twist-two-related CFFs.

1 Note that the neural network analysis of deeply inelastic scattering data also started by fitting the structure function $F_2$ directly [18, 19].

The paper is organized as follows. In Section 2 we describe salient features of the neural network approach. In Section 3 we present fits to a well-defined subset of the available DVCS data, so that we can study the neural network method in a controlled situation, free of the subtleties that surface in fits involving data from a number of different experiments. In particular, we used HERMES measurements of two observables, the beam spin asymmetry (BSA) and the beam charge asymmetry (BCA), in leptoproduction of a real photon (of which DVCS is a subprocess) [28]. This data belongs to a kinematic region where the two asymmetries are determined essentially by the imaginary and the real part of the Compton form factor $\mathcal{H}$ (1), respectively, and where the dependence of $\mathcal{H}$ on $Q^2$ is weak and can therefore be neglected for simplicity. The neural network parametrization of the CFF $\mathcal{H}$ is then used to make a prediction for another observable, namely the beam charge-spin asymmetry in the kinematic region to be explored by the COMPASS II experiment [29]. In Section 4 we place particular emphasis on the determination of uncertainties. To do so, we fit a simple model using the standard least-squares method and compare the resulting uncertainties with those obtained by neural networks. Section 5 contains conclusions.

2 Fitting data with neural networks 

Neural networks have been applied to a variety of tasks, including optimization problems. One of their main attractions is their apparent ability to mimic the behavior of the human brain: with their multi-connectedness and parallel processing of information, they can be trained to perform relatively complex classification and pattern-recognition tasks. To this end, many kinds of neural networks and corresponding "learning" algorithms have been developed, see, e.g., [30]. In our study we used one of the most popular neural network types, known as the multilayer perceptron. Shown schematically in Fig. 1, it is a mathematical structure consisting of a number of interconnected "neurons" organized in several layers. Each neuron has several inputs and one output. The value at the output is given as a function $f(\sum_j w_j x_j)$ of a sum of the values at the inputs $x_1, x_2, \ldots$, each weighted by a certain number $w_j$. Consequently, a multilayer perceptron is defined by its architecture (the number of layers and the number of neurons in each layer), by the values of its weights, and by the shape of its activation function $f(x)$.

Figure 1: A neural network (multilayer perceptron with 2-5-2 architecture) parametrization of the complex-valued CFF $\mathcal{H}(x_B,t)$. Each blob symbolizes a neuron, and the thickness of the arrows represents the strengths of the weights $w_j$. (Smaller blobs with no input are so-called biases; they improve the network's properties [30].)

Again somewhat inspired by biological neuron properties, the activation function is often taken as a nonlinear function with saturating properties for large and small values of its argument. We employed the very popular logistic sigmoid function

$$f(x) = \frac{1}{1 + e^{-x}}$$

for neurons in the inner ("hidden") layer(s), while for the input and output layers we used the identity function² $f(x) = x$. By iterating over the following steps the network is then trained, i.e., it "learns" how to describe a certain set of data points:

1. The kinematic values (two in our case: $x_B$ and $t$) of the first data point are presented to the two input-layer neurons.

2. These values are then propagated through the network according to the weights and activation functions. In the first iteration, the weights are set to random values.

3. As a result, the network produces some values in its output-layer neurons. Here we have two: $\mathrm{Im}\,\mathcal{H}$ and $\mathrm{Re}\,\mathcal{H}$. Obviously, after the first iteration, these will be some random functions of the input kinematic values: $\mathrm{Im}\,\mathcal{H}(x_B,t)$ and $\mathrm{Re}\,\mathcal{H}(x_B,t)$.

4. Using these values of the CFF(s), the observable(s) corresponding to the first data point is (are) calculated and compared to the actually measured value(s), with the squared error used for building the standard $\chi^2$ function.

5. The obtained error is then used to modify the network: it is, possibly weighted by the inverse uncertainty of the experimental measurement, propagated backwards through the layers of the network, and each weight is adjusted such that this error is decreased. The concrete algorithm for this weight adjustment is discussed at the end of this section.

6. This procedure is then repeated with the next data point, until the whole training set is exhausted.

2 To increase the representational power of a given network, one can also use the sigmoid function in the input and output layers, but then the input and output values have to be rescaled so that they don't fall into the saturation regions of the activation function. For our relatively simple case the identity function is sufficient.

This sequence of steps (called a training epoch) is repeated until the network is capable of describing the experimental data with sufficient accuracy; the precise stopping criterion is specified below.
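The training loop of steps 1-6 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' PyBrain code: it uses plain per-point gradient descent with a fixed learning rate (rather than the resilient back-propagation adopted below), and the "observables" are a hypothetical linear map of ($\mathrm{Im}\,\mathcal{H}$, $\mathrm{Re}\,\mathcal{H}$) standing in for the real DVCS formulae. The names `MLP`, `train_epoch` and `chi2`, and the toy data, are illustrative only.

```python
import numpy as np


class MLP:
    """2-n_hidden-2 perceptron: logistic-sigmoid hidden layer, identity output layer."""

    def __init__(self, rng, n_in=2, n_hidden=13, n_out=2):
        # Step 2: weights start from small random values; biases start at zero.
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = 1.0 / (1.0 + np.exp(-(self.W1 @ x + self.b1)))  # sigmoid hidden layer
        return self.W2 @ h + self.b2, h                      # identity output layer


def chi2(net, X, C, y, sigma):
    """chi^2 of the (toy) observables computed from the network output."""
    pred = np.array([c @ net.forward(x)[0] for x, c in zip(X, C)])
    return float(np.sum(((pred - y) / sigma) ** 2))


def train_epoch(net, X, C, y, sigma, lr=1e-3):
    """One epoch: steps 1-6, adjusting the weights after every data point."""
    for x, c, yi, si in zip(X, C, y, sigma):
        out, h = net.forward(x)                      # steps 1-3
        obs = c @ out                                # step 4: hypothetical observable
        d_out = 2.0 * (obs - yi) / si**2 * c         # step 5: d(error)/d(output)
        d_hid = (net.W2.T @ d_out) * h * (1.0 - h)   # back through the sigmoid
        net.W2 -= lr * np.outer(d_out, h)
        net.b2 -= lr * d_out
        net.W1 -= lr * np.outer(d_hid, x)
        net.b1 -= lr * d_hid


# Hypothetical data: 18 "BSA-like" points probing Im H, 18 "BCA-like" probing Re H.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0.05, 0.24, 36), rng.uniform(-0.46, -0.02, 36)])
C = np.array([[1.0, 0.0]] * 18 + [[0.0, 1.0]] * 18)
true_cff = np.column_stack([3.0 * np.exp(2.0 * X[:, 1]) * (1.0 - X[:, 0]),
                            -1.0 + 2.0 * X[:, 0]])
sigma = np.full(36, 0.2)
y = np.sum(C * true_cff, axis=1) + rng.normal(0.0, sigma)

net = MLP(rng)
chi2_before = chi2(net, X, C, y, sigma)
for _ in range(300):
    train_epoch(net, X, C, y, sigma)
chi2_after = chi2(net, X, C, y, sigma)
```

The weighting by $1/\sigma^2$ in the error derivative corresponds to the optional weighting mentioned in step 5.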

From what has been said, it should be clear that a neural network is nothing more than a complicated nonlinear multi-parameter function and that the training procedure is equivalent to least-squares fitting of this function to data. The actual power of the approach stems from the following:

• A neural network serves as an unbiased interpolating function. Its complicated dependence on its parameters enables it to approximate any smooth function with comparable ease³.

• Updating the weights of a given neuron by back-propagation of the error uses only values and gradients of activation functions in the immediate network neighborhood. In this sense the training process is local, and it is algorithmically efficient.

• This approach enables the following convenient method for propagating experimental uncertainties (and even their correlations) into the final result. In our application, each network interpolates a "replica data set", obtained from the original data by generating random artificial data points from a Gaussian probability distribution whose width is defined by the error bar of the experimental measurement. Taking a large number $N_{\rm rep}$ of such replicas, the resulting family of trained neural networks $\mathcal{H}^{(1)}, \ldots, \mathcal{H}^{(N_{\rm rep})}$ defines a probability distribution of the represented CFF $\mathcal{H}(x_B,t)$ and of any functional $\mathcal{F}[\mathcal{H}]$ thereof. Thus, the mean value of such a functional and its variance are [18]

$$\langle \mathcal{F}[\mathcal{H}]\rangle = \frac{1}{N_{\rm rep}}\sum_{k=1}^{N_{\rm rep}} \mathcal{F}[\mathcal{H}^{(k)}], \tag{4}$$

$$\left(\Delta\mathcal{F}[\mathcal{H}]\right)^2 = \langle \mathcal{F}[\mathcal{H}]^2\rangle - \langle \mathcal{F}[\mathcal{H}]\rangle^2. \tag{5}$$

For details of this "Monte Carlo" procedure see Refs. [18, 32], and note that this is a general method of error propagation, which is not related to neural networks themselves and can also be used for error propagation in standard least-squares fitting of model functions to data.

3 This flexibility of a neural network is formalized in the universal approximation theorem [30]. Roughly speaking, this theorem states that any continuous multivariate function $f(x_1, x_2, \ldots)$ can be approximated arbitrarily accurately by a multilayer perceptron. Strictly speaking, one hidden layer is enough for this, but networks with more hidden layers can have fewer neurons and/or can be trained more efficiently.

Figure 2: Typical progress of neural network training: the error for the description of the training data (dashed) decreases monotonically, while the error for the validation data (solid) increases when the neural network is overfitted. In the plotted example one would use the network obtained after 500-700 training epochs as the final result.
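Since the Monte Carlo propagation of Eqs. (4) and (5) is independent of neural networks, it can be illustrated with any fit. The sketch below is a stand-in under stated assumptions: each replica is fitted by a weighted polynomial least-squares fit (`numpy.polyfit`) instead of a trained network, and the functional $\mathcal{F}$ is simply the fitted function evaluated at chosen points. The function names are hypothetical.

```python
import numpy as np


def make_replicas(y, sigma, n_rep, rng):
    """Replica k: each data point drawn from a Gaussian N(y_i, sigma_i^2)."""
    return y + sigma * rng.standard_normal((n_rep, len(y)))


def mc_mean_and_error(x, y, sigma, x_eval, n_rep=1000, degree=2, seed=1):
    """Fit every replica, then apply Eqs. (4)-(5) to F[f] = f(x_eval)."""
    rng = np.random.default_rng(seed)
    values = []
    for y_rep in make_replicas(y, sigma, n_rep, rng):
        # Stand-in for training one neural network on one replica
        # (polyfit expects weights 1/sigma for Gaussian errors).
        coeffs = np.polyfit(x, y_rep, degree, w=1.0 / sigma)
        values.append(np.polyval(coeffs, x_eval))
    values = np.array(values)                    # shape (n_rep, len(x_eval))
    mean = values.mean(axis=0)                   # Eq. (4)
    var = (values**2).mean(axis=0) - mean**2     # Eq. (5)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Toy usage: straight-line data with error bars 0.1, evaluated inside (0.5)
# and outside (2.0) the data range.
x = np.linspace(0.0, 1.0, 10)
sigma = np.full(10, 0.1)
y = 1.0 + 2.0 * x
mean, err = mc_mean_and_error(x, y, sigma, np.array([0.5, 2.0]), n_rep=4000, degree=1)
```

In this toy case the Monte Carlo error at the extrapolated point comes out several times larger than inside the data range, as one expects from standard error propagation.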

The power of neural networks to approximate any continuous function also implies the danger of overfitting the data (also known as overtraining). Namely, after a certain number of iterations the network will not only describe the general dependence of observables on kinematic variables, but will also adjust to the random fluctuations of the data. This unwanted behavior is prevented by the cross-validation procedure, in which the initial data is divided into two sets: a training set and a validation set. The performance of the network is then continuously checked on the validation set. Since this set is not used for training, the error of the network's description of the validation set will increase after the onset of overfitting, see Fig. 2. This is the moment at which training is stopped.
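This stopping rule can be written as a small, framework-agnostic routine. It is a sketch only: the `patience` parameter (how many epochs without improvement are tolerated before declaring the onset of overfitting) is our assumption, since the text only says that training stops once the validation error starts to rise, and `train_one_epoch` and `validation_error` are hypothetical user-supplied callables.

```python
import copy


def train_with_early_stopping(model, train_one_epoch, validation_error,
                              max_epochs=2000, patience=100):
    """Train until the validation error stops improving, or for max_epochs.

    Returns a copy of the model at the epoch of lowest validation error,
    that epoch number, and the full validation-error history.
    """
    best_model = copy.deepcopy(model)
    best_err = validation_error(model)
    best_epoch = 0
    history = [best_err]
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model)            # weight updates use the training set only
        err = validation_error(model)     # the validation set is never trained on
        history.append(err)
        if err < best_err:
            best_model, best_err, best_epoch = copy.deepcopy(model), err, epoch
        elif epoch - best_epoch >= patience:
            break                         # no improvement for `patience` epochs: stop
    return best_model, best_epoch, history


# Toy usage mimicking Fig. 2: a validation error with its minimum at epoch 600.
state = {"epoch": 0}


def one_epoch(m):
    m["epoch"] += 1


def val_error(m):
    return 1.0 + (m["epoch"] - 600) ** 2 / 1e4


best, best_epoch, history = train_with_early_stopping(state, one_epoch, val_error)
```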

For more information on neural networks we refer the reader to the vast literature on the subject, e.g. [30], while the PhD thesis [32] is a good reference for the particular approach also used in this work. In the remainder of this section, we specify some details of the neural networks we used.

In our study we employed the PyBrain software library for neural networks [33]. This choice was motivated by (i) the great flexibility of this library, which allows one to explore various types of neural networks, and (ii) the fact that it is written in the Python programming language, so that it was easy to link with our already existing Python code for DVCS analysis. Like almost every other freely available neural network software (at least to our knowledge), PyBrain expects the network output to be compared directly to training data samples for the error calculation. In our case, however, the network output must be transformed, i.e., the DVCS observables must be calculated from $\mathrm{Im}\,\mathcal{H}$ and $\mathrm{Re}\,\mathcal{H}$ (see step 4 of the training procedure described above), before the error can be evaluated. For this, we had to make some modifications to the PyBrain software.

Concerning the network architecture, the numbers of neurons in the input and output layers were fixed by the problem at hand to be two, see Fig. 1. To fix the number of hidden layers and their neurons, one has to consider a trade-off between two requirements: the number of neurons in the hidden layer(s) must be large enough to ensure sufficient expressive power of the network, but must not be too large with respect to the number of available data points, otherwise the training becomes difficult. In our case, it is already known from previous studies, e.g. […], that the CFF $\mathcal{H}$ is a reasonably well-behaved function. Thus, since we used a relatively small sample of a few tens of data points (see next section), just one hidden layer with about ten neurons proved to be sufficient. The specific results presented in this paper were obtained by training series of neural networks with 13 neurons in the hidden layer each. We checked that the results do not change appreciably if a neuron is added to or removed from the hidden layer, which is one of the standard tests in neural network applications.

There are various training algorithms available for updating the network weights, from a simple back-propagation algorithm [30] to genetic algorithms like the one used by the NNPDF group [14, 32]. After some experimentation we ended up using the so-called resilient back-propagation algorithm [34]. This algorithm is a modification of the usual back-propagation algorithm: only the signs of the partial derivatives of the error function, and not their magnitudes, are taken into account. Resilient back-propagation is known to be very reliable, fast, and insensitive to the parameters of the network and of the learning process. More sophisticated algorithms are needed when correlations are included in the $\chi^2$ error function, or in cases where the connection between the network output and the experimental observables is strongly nonlinear. The latter would, e.g., be the case if the network output represented GPDs rather than CFFs, so that the connection would involve convolutions with QCD evolution operators and hard-scattering coefficient functions. Let us finally note that, besides perceptrons, there are attempts to apply other kinds of neural networks in nucleon structure studies, such as self-organizing maps [35].
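The sign-based update can be sketched as follows. We show the widely used iRprop- variant (no weight backtracking after a sign change) in NumPy, acting on a generic gradient; the text does not state which Rprop variant the modified PyBrain setup used, and the default constants (growth factor 1.2, shrink factor 0.5, step bounds) are common textbook choices, not values taken from this work.

```python
import numpy as np


def irprop_minus(grad, w0, n_iter=200, step0=0.1, eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """Minimize a function, given its gradient, with the iRprop- variant of Rprop."""
    w = np.asarray(w0, dtype=float).copy()
    step = np.full_like(w, step0)   # one adaptive step size per weight
    g_prev = np.zeros_like(w)
    for _ in range(n_iter):
        g = grad(w)
        prod = g * g_prev
        # Same sign as last time: grow the step. Sign flip: a minimum was jumped over.
        step = np.where(prod > 0, np.minimum(step * eta_plus, step_max), step)
        step = np.where(prod < 0, np.maximum(step * eta_minus, step_min), step)
        g = np.where(prod < 0, 0.0, g)    # iRprop-: skip this weight's update after a flip
        w -= np.sign(g) * step            # only the sign of the derivative is used
        g_prev = g
    return w


# Toy usage: a quadratic whose curvatures differ by six orders of magnitude.
def grad_quadratic(w):
    return np.array([1000.0 * (w[0] - 1.0), 1e-3 * (w[1] + 2.0)])


w_min = irprop_minus(grad_quadratic, [0.0, 0.0])
```

Because the step sizes ignore the gradient magnitudes, both badly scaled directions converge at a similar rate, which illustrates why the method is insensitive to the parameters of the learning process.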



3 Example fit to data 

For the first application, to assess the power of the neural network approach, we took just two sets of measurements of leptoproduction of a real photon by scattering leptons off unpolarized protons. One set consisted of 18 measurements of the first sine harmonic $A_{LU}^{\sin\phi}$ of the beam spin asymmetry (BSA)⁴

$$\mathrm{BSA} = \frac{d\sigma^{e\uparrow} - d\sigma^{e\downarrow}}{d\sigma^{e\uparrow} + d\sigma^{e\downarrow}} \approx A_{LU}^{\sin\phi}\,\sin\phi \tag{6}$$

(where $\phi$ is the azimuthal angle in the so-called Trento convention), while the other set contained 18 measurements of the first cosine harmonic $A_C^{\cos\phi}$ of the beam charge asymmetry (BCA)

$$\mathrm{BCA} = \frac{d\sigma^{e^+} - d\sigma^{e^-}}{d\sigma^{e^+} + d\sigma^{e^-}} \approx A_C^{\cos 0\phi} + A_C^{\cos\phi}\,\cos\phi. \tag{7}$$



Both sets cover the same kinematic region:

$$0.05 < x_B < 0.24, \qquad 0.02 < -t/\mathrm{GeV}^2 < 0.46, \qquad 1.2 < Q^2/\mathrm{GeV}^2 < 6.11.$$
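To make the harmonic decomposition in Eqs. (6) and (7) concrete, the sketch below extracts $A_{LU}^{\sin\phi}$, $A_C^{\cos 0\phi}$ and $A_C^{\cos\phi}$ from asymmetries given in bins of $\phi$ by a linear least-squares projection onto the corresponding basis functions. It assumes noise-free toy asymmetries and evenly spaced $\phi$ bins; it is not the procedure of the HERMES analysis [28], and the function name is hypothetical.

```python
import numpy as np


def fit_harmonics(phi, asym, basis):
    """Least-squares coefficients of asym(phi) in a list of basis functions of phi."""
    design = np.column_stack([f(phi) for f in basis])
    coeffs, *_ = np.linalg.lstsq(design, asym, rcond=None)
    return coeffs


# Toy asymmetries in 12 bins of the azimuthal angle phi (bin centres).
phi = np.linspace(-np.pi, np.pi, 12, endpoint=False) + np.pi / 12
bsa = 0.25 * np.sin(phi)              # Eq. (6) with A_LU^{sin phi} = 0.25
bca = -0.02 + 0.06 * np.cos(phi)      # Eq. (7) with A_C^{cos 0phi} = -0.02, A_C^{cos phi} = 0.06

(a_sin,) = fit_harmonics(phi, bsa, [np.sin])
a_cos0, a_cos1 = fit_harmonics(phi, bca, [np.ones_like, np.cos])
```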

As mentioned in the Introduction, we assumed that QCD evolution effects can be neglected, i.e., that CFFs are independent of $Q^2$. Furthermore, it has been shown that the hypothesis of CFF $\mathcal{H}$ dominance leads to a successful description of this data, and so we relied on this assumption (with the understanding that it is only an approximation, which should be given up once sufficiently precise data is available). Thus, at present, just a single CFF $\mathcal{H}(x_B,t)$, or equivalently two real-valued functions $\mathrm{Im}\,\mathcal{H}(x_B,t)$ and $\mathrm{Re}\,\mathcal{H}(x_B,t)$, is extracted from data by our neural networks.

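A comment about the relation between $\mathrm{Im}\,\mathcal{H}$ and $\mathrm{Re}\,\mathcal{H}$ is in order. Analytic properties of the CFF $\mathcal{H}$ relate these two functions via a "dispersion relation", which reads in the twist-two approximation

$$\mathrm{Re}\,\mathcal{H}\left(x_B = \frac{2\xi}{1+\xi}, t\right) = \frac{1}{\pi}\,\mathrm{PV}\int_0^1 dx \left(\frac{1}{\xi - x} - \frac{1}{\xi + x}\right)\mathrm{Im}\,\mathcal{H}\left(x_B = \frac{2x}{1+x}, t\right) - \mathcal{C}(t), \tag{8}$$

cf. Eqs. (2) and (3). As discussed in the Introduction, this could be used to simplify the set of fitting functions from

$$\{\mathrm{Im}\,\mathcal{H}(x_B,t),\ \mathrm{Re}\,\mathcal{H}(x_B,t)\} \quad\text{to}\quad \{\mathrm{Im}\,\mathcal{H}(x_B,t),\ \mathcal{C}(t)\}.$$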
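Numerically, the principal-value integral in Eq. (8) can be handled by subtracting the integrand at $x = \xi$ and adding back the analytic result $\mathrm{PV}\int_0^1 dx/(\xi - x) = \ln[\xi/(1-\xi)]$. The NumPy sketch below does this with Gauss-Legendre quadrature. It assumes a smooth user-supplied function `im_h(x)` (a hypothetical name) returning $\mathrm{Im}\,\mathcal{H}$ at $x_B = 2x/(1+x)$ for fixed $t$; it is a minimal illustration, not the code used for the fits in Section 4.

```python
import numpy as np


def re_from_im(im_h, xi, subtraction_constant=0.0, n_nodes=200):
    """Re H at xi from Im H via the twist-two dispersion relation, Eq. (8).

    im_h(x) must accept NumPy arrays; requires 0 < xi < 1.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    x = 0.5 * (nodes + 1.0)      # map Gauss-Legendre nodes from [-1, 1] to [0, 1]
    w = 0.5 * weights
    f_x, f_xi = im_h(x), im_h(xi)
    # PV int_0^1 f(x)/(xi - x): regular subtracted part + f(xi) * ln(xi/(1 - xi)).
    pv_term = np.sum(w * (f_x - f_xi) / (xi - x)) + f_xi * np.log(xi / (1.0 - xi))
    # The 1/(xi + x) term has no singularity on [0, 1] for xi > 0.
    regular_term = np.sum(w * f_x / (xi + x))
    return (pv_term - regular_term) / np.pi - subtraction_constant


# Toy check: for a constant Im H = 1 (and C = 0) the result is known in closed
# form, Re H(xi) = ln(xi^2 / (1 - xi^2)) / pi.
xi = 0.3
re_const = re_from_im(lambda x: np.ones_like(x), xi)
re_exact = np.log(xi**2 / (1.0 - xi**2)) / np.pi
```

For realistic, Regge-like $\mathrm{Im}\,\mathcal{H} \propto x^{-\alpha}$ at small $x$, a substitution concentrating nodes near $x = 0$ would speed up convergence; the plain mapping above is kept for clarity.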

However, for two reasons we treat the imaginary and real parts of the CFF $\mathcal{H}$ here as independent quantities. First, assuming the "dispersion relation" (8) by itself introduces a theoretical bias we want to avoid. In Section 4 we will see that model-dependent least-squares fits based on the "dispersion relation" (8) link, e.g., assumptions about the low-$x$ behavior of $\mathrm{Im}\,\mathcal{H}$ to that of $\mathrm{Re}\,\mathcal{H}$, which might lead to rather misleading results. In contrast, neural networks that do not make use of (8) give large uncertainties for both, which is probably more realistic.



4 Actually, what was measured is the so-called charge-difference BSA, see [28] for its definition, but in the leading-twist approximation it is equal to the simple BSA (6), and we disregard the difference between these two observables in this paper.
Figure 3: First cosine harmonic of the beam charge asymmetry, $A_C^{\cos\phi}$ (7), and first sine harmonic of the beam spin asymmetry, $A_{LU}^{\sin\phi}$ (6), resulting from the neural network fit (hatched areas), shown together with the data [28] used for training. Two model fits, KM09a (solid) and KM09b (dashed) from […], are also shown for comparison.



Second, to implement the "dispersion relation" approach in a neural network framework, one would need either (i) to disconnect topologically the network output representing $\mathcal{C}(t)$ from the input representing $x_B$, or (ii) to create a separate neural network with one input and one output (1-1) for $\mathcal{C}(t)$ and train it together with the two-input, one-output (2-1) network representing $\mathrm{Im}\,\mathcal{H}(x_B,t)$. Once all available fixed-target data is taken into account in the future, such a strategy might well be feasible; if so, it would allow one to get rid of the $\mathcal{H}$ dominance hypothesis and might thus improve the phenomenological value of the final result.

Using the procedure described in Section 2, we created 50 replicas from the set of 36 measured data points, which were randomly divided into a training set of 25 points and a validation set of 11 points. We then trained one neural network on each training set, monitoring progress on the validation set, as in Fig. 2. Training was performed either until the onset of overfitting or for 2000 epochs, whichever came first. We ended up with 50 neural networks, each representing the CFF $\mathcal{H}$. Most of the neural networks fit the original data set quite well (44 of our 50 neural networks have a $\chi^2$ smaller than the number of data points, 36), with a few poorer fits, as expected due to the stochastic nature of the set of data replicas.
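The replica generation and random 25/11 split described above can be sketched as a small generator. We assume a fresh random split for every replica (the text does not say whether the split was redrawn per replica), and the function name, the seed handling, and the toy data are hypothetical.

```python
import numpy as np


def replicas_with_splits(y, sigma, n_rep=50, n_train=25, seed=0):
    """Yield (replica, train_idx, val_idx) for each of n_rep Gaussian replicas."""
    rng = np.random.default_rng(seed)
    n = len(y)
    for _ in range(n_rep):
        replica = y + sigma * rng.standard_normal(n)   # smear every data point
        perm = rng.permutation(n)                       # random training/validation split
        yield replica, np.sort(perm[:n_train]), np.sort(perm[n_train:])


# Toy usage with 36 hypothetical measurements: 50 replicas with 25 training and
# 11 validation points each; one network would then be trained per tuple.
y_data = np.linspace(-0.3, 0.3, 36)
sigma_data = np.full(36, 0.05)
splits = list(replicas_with_splits(y_data, sigma_data))
```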

Using this set of neural networks as a probability distribution in the functional space of all $\mathcal{H}$, one can predict any function of $\mathcal{H}$, together with its uncertainty, using (4) and (5). As a consistency check, we plot in Fig. 3 the results for the very observables that were used for training the networks. We observe that the uncertainties of the measurements have been correctly propagated into the corresponding uncertainties given by the neural networks. This is particularly visible in the first two panels of Fig. 3, where the uncertainties of both the data and the neural networks increase with $|t|$ in the same fashion.

Figure 4: Predictions of the neural network fits for the beam charge-spin asymmetry (9) in COMPASS II kinematics ($E_\mu = 160$ GeV, $\phi = 0$) (hatched areas). Two model fits, KM09a (solid) and KM09b (dashed) from […], are also shown for comparison.

Error bands here and in other figures denote the uncertainty corresponding to one standard deviation (one sigma). We checked for departures of the network results from a Gaussian distribution and found them to be small in the region where data is available. This means that a one-sigma error band indeed corresponds to a 68% confidence level. However, as also observed by the NNPDF group [14], in the extrapolation region, where there is no data, departures from a Gaussian distribution are significant, and 68% confidence-level regions are generally smaller than the one-sigma regions we plot.
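The difference between a one-sigma band and a 68% confidence region is easy to check on any ensemble of network predictions. In the sketch below the ensembles are hypothetical stand-ins (a Gaussian sample and a skewed log-normal sample rather than actual network outputs), and the 68% region is taken as the central interval between the 15.87th and 84.13th percentiles.

```python
import numpy as np


def one_sigma_band(values):
    """Mean +/- standard deviation of an ensemble, as in Eqs. (4)-(5)."""
    m, s = values.mean(), values.std()
    return m - s, m + s


def central_68_interval(values):
    """Central 68% interval of an ensemble (15.87th to 84.13th percentile)."""
    lo, hi = np.percentile(values, [15.87, 84.13])
    return lo, hi


def width(interval):
    lo, hi = interval
    return hi - lo


rng = np.random.default_rng(0)
gaussian = rng.normal(0.0, 1.0, 100_000)     # e.g. predictions where data exists
skewed = rng.lognormal(0.0, 1.0, 100_000)    # e.g. a non-Gaussian extrapolation

w_sigma_gauss, w_68_gauss = width(one_sigma_band(gaussian)), width(central_68_interval(gaussian))
w_sigma_skew, w_68_skew = width(one_sigma_band(skewed)), width(central_68_interval(skewed))
```

For the Gaussian sample the two widths agree, while for the skewed sample the 68% interval comes out markedly narrower than the one-sigma band, the same pattern as described above for the extrapolation region.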

As an example of a genuine prediction coming from our analysis, we plot in Fig. 4 the beam charge-spin asymmetry (BCSA)

$$\mathrm{BCSA} = \frac{d\sigma^{\mu^+\downarrow} - d\sigma^{\mu^-\uparrow}}{d\sigma^{\mu^+\downarrow} + d\sigma^{\mu^-\uparrow}} \tag{9}$$

as a function of $t$ for several kinematic points characteristic of the COMPASS II experiment (where the muon is taken to be massless and the polarization is set equal to 0.8). This experiment was chosen because its kinematics overlaps with that of the HERMES data used for neural network training. Hence these predictions represent partly an interpolation and partly an extrapolation of the HERMES data, thus testing this framework in a nontrivial way.
4 Comparison with model fits and assessment of uncertainties



One of the main advantages of the neural network approach to hadron structure is that it offers a convenient way to assess the uncertainties of non-perturbative functions (GPDs, CFFs, PDFs, …). Let us now emphasize this point further by making a comparison with the common method in which one chooses some model and makes a least-squares fit of it to the data. To quantify this comparison, we took a simple model of the CFF $\mathcal{H}$ and fitted it to the very same data on which the neural networks were trained.

We adopted a version of the model used in [5], for which the partonic decomposition of $\mathrm{Im}\,\mathcal{H}$ is

$$\mathrm{Im}\,\mathcal{H}(x_B,t) = \pi\left[\left(\frac{4}{9}\cdot 2 + \frac{1}{9}\right) H^{\rm val}(\xi,\xi,t) + \frac{2}{9}\, H^{\rm sea}(\xi,\xi,t)\right]. \tag{10}$$

We parametrized the GPDs along the cross-over trajectory $\xi = x$ as

$$H(x,x,t) = \frac{n\,r}{1+x}\left(\frac{2x}{1+x}\right)^{-\alpha(t)}\left(\frac{1-x}{1+x}\right)^{b}\left(1 - \frac{1-x}{1+x}\,\frac{t}{M^2}\right)^{-p}. \tag{11}$$

Here $2^{-\alpha(0)} n$ is the normalization of the PDF $q(x) = H(x,0,0)$ from PDF fits, $r$ is the skewness ratio at small $x$ (the ratio of a GPD at some point on the cross-over trajectory to the corresponding PDF), $\alpha(t)$ is the "Regge trajectory", $b$ controls the large-$x$ behavior, and $M$ and $p$ control the $t$-dependence [36]. The parameters of the sea-quark GPD $H^{\rm sea}$ were taken to be as in […]: $\alpha^{\rm sea}(t) = 1.13 + 0.15\,t/\mathrm{GeV}^2$, $n^{\rm sea} = 1.35$, $r^{\rm sea} = 1$, $b^{\rm sea} = 2$, $(M^{\rm sea})^2 = 0.5\ \mathrm{GeV}^2$, and $p^{\rm sea} = 2$. For the valence-quark GPD $H^{\rm val}$, we also fixed $\alpha^{\rm val}(t) = 0.43 + 0.85\,t/\mathrm{GeV}^2$, $n^{\rm val} = 1.35$, and $p^{\rm val} = 1$. This left $r^{\rm val}$, $b^{\rm val}$, and $M^{\rm val}$ as free parameters to be determined by the fit to the data. We obtained $\mathrm{Re}\,\mathcal{H}$ from the "dispersion relation" (8), treating the subtraction constant $\mathcal{C}$ as a fourth and final free parameter of the model⁵.
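For reference, the ansatz (11) is straightforward to code. The sketch below is a transcription under stated assumptions, not the authors' implementation: $\alpha(t)$ is taken to be linear, $\alpha(t) = \alpha_0 + \alpha' t$, as for the trajectories quoted above; $t$ and $M^2$ are in GeV²; and the example parameter sets are the sea values listed above and the fixed valence values combined with the fitted $r^{\rm val}$, $b^{\rm val}$, $M^{\rm val}$ of Eq. (12). The function name is hypothetical.

```python
import numpy as np


def h_crossover(x, t, n, r, alpha0, alpha_prime, b, M2, p):
    """Eq. (11): GPD on the cross-over line xi = x.

    Assumes alpha(t) = alpha0 + alpha_prime * t; t and M2 in GeV^2 (t <= 0).
    """
    alpha_t = alpha0 + alpha_prime * t
    z = (1.0 - x) / (1.0 + x)
    return (n * r / (1.0 + x)
            * (2.0 * x / (1.0 + x)) ** (-alpha_t)
            * z**b
            * (1.0 - z * t / M2) ** (-p))


# Sea-quark parameters as listed in the text.
sea = dict(n=1.35, r=1.0, alpha0=1.13, alpha_prime=0.15, b=2.0, M2=0.5, p=2.0)
# Valence: fixed values from the text plus the fitted r, b, M of Eq. (12).
val = dict(n=1.35, r=1.11, alpha0=0.43, alpha_prime=0.85, b=1.79, M2=0.51**2, p=1.0)

x = np.array([0.05, 0.1, 0.2])
h_sea = h_crossover(x, -0.2, **sea)
h_val = h_crossover(x, -0.2, **val)
```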

Fitting this model to the same 36 data points used for the neural network fit, using the well-known Minuit software package [37] for minimization of the $\chi^2$ function, resulted in a good fit ($\chi^2/\mathrm{n.d.o.f.} = 22.2/32$) with parameter values

$$r^{\rm val} = 1.11, \qquad b^{\rm val} = 1.79, \qquad M^{\rm val} = 0.51\ \mathrm{GeV}, \qquad \mathcal{C} = 2.25. \tag{12}$$



The uncertainty of the resulting fit is commonly determined by the so-called Hessian method. The Hessian matrix is given by the second derivatives of $\chi^2$ with respect to the model parameters $a_i$ at the minimum $a = a_0$:

$$H_{ij} = \frac{1}{2}\left.\frac{\partial^2\chi^2}{\partial a_i\,\partial a_j}\right|_{a=a_0}. \tag{13}$$



5 We assume the subtraction constant in (8) to be $t$-independent, $\mathcal{C}(t) = \mathcal{C}$. This is motivated a posteriori by the fact that such a simple model is good enough to fit the data. Additionally, when we parametrize $\mathcal{C}(t) = \mathcal{C}/(1 - t/M_C^2)^2$, the parameter $M_C$ tends to be strongly correlated with the other parameters of the model, making the error analysis, which is our main task, more difficult.
The uncertainty $\Delta f$ of a function $f(a_i)$ of these parameters, including the uncertainty of the parameters themselves, is given by the error propagation formula

$$(\Delta f)^2 = \Delta\chi^2 \sum_{ij} \frac{\partial f}{\partial a_i}\,(H^{-1})_{ij}\,\frac{\partial f}{\partial a_j}. \tag{14}$$
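Eqs. (13) and (14) translate directly into a few lines of NumPy. The sketch below is a minimal illustration under stated assumptions: it uses plain central finite differences with a fixed step `h` (not the iterative choice of directions and step sizes of [38]), and it is checked on a hypothetical straight-line fit, for which $H_{ij}$ is known exactly. The function names are illustrative only.

```python
import numpy as np


def hessian_half(chi2, a0, h=1e-3):
    """Eq. (13): H_ij = (1/2) d^2 chi2 / da_i da_j at a0, by central differences."""
    a0 = np.asarray(a0, dtype=float)
    n = len(a0)
    steps = np.eye(n) * h
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = steps[i], steps[j]
            H[i, j] = (chi2(a0 + ei + ej) - chi2(a0 + ei - ej)
                       - chi2(a0 - ei + ej) + chi2(a0 - ei - ej)) / (4.0 * h * h)
    return 0.5 * H


def hessian_error(grad_f, H, delta_chi2=1.0):
    """Eq. (14): Delta f = sqrt(Delta chi2 * sum_ij df/da_i (H^-1)_ij df/da_j)."""
    return float(np.sqrt(delta_chi2 * grad_f @ np.linalg.solve(H, grad_f)))


# Toy check: straight-line fit y = a_0 + a_1 x with error bars 0.1.
x = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * x + 0.05 * np.sin(7.0 * x)
J = np.column_stack([np.ones_like(x), x])


def chi2_line(a):
    return float(np.sum(((y - J @ a) / 0.1) ** 2))


a_best = np.linalg.lstsq(J, y, rcond=None)[0]
H = hessian_half(chi2_line, a_best)
# Uncertainty of f(a) = a_0 + 0.5 a_1, i.e. the fitted line at x = 0.5.
delta_f = hessian_error(np.array([1.0, 0.5]), H)
```

With $\Delta\chi^2 = 1$ this reproduces the textbook result $\sigma/\sqrt{N}$ for the fitted line at the mean of the data points.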

In a textbook approach to statistics, the one-sigma uncertainty is obtained by taking $\Delta\chi^2 = 1$ in this formula. However, there are several problems with this simple procedure. First, $\chi^2$ tends to vary very differently in different directions of parameter space, yielding a Hessian with large variations in its eigenvalue spectrum, which may then cause numerical difficulties. In our case, where the eigenvalues vary over three orders of magnitude, the problem was not so severe. However, in DVCS fits involving more parameters, we have also observed much stronger variations. To improve the reliability of the Hessian method, we followed an iterative procedure [38], which allows one to find natural directions in parameter space and natural step sizes for the finite-difference calculation of the Hessian matrix. Second, it is known that in global fits, i.e., when one combines data from several different experiments, errors are not distributed in the expected way. Often, reasoning strictly within textbook statistics, one would have to reject some data as incompatible either with itself (i.e., because its $\chi^2$ is too large) or with other data sets (i.e., because the likelihood curves overlap poorly). This is mainly due to systematic errors that are not understood. Without entering further into the discussion of this complex issue, we can follow the CTEQ procedure [39], where $\Delta\chi \equiv \sqrt{\Delta\chi^2}$ in (14) is increased ad hoc to $\Delta\chi = T$ in order to accommodate all experimental measurements used for fitting. The tolerance parameter $T$ is determined by calculating the average distance from the best fit, along the eigenvectors of the Hessian matrix, at which the model is still compatible with some particular experiment at some confidence level (C.L.). In global PDF fits, with many data sets, values of $T$ between 5 and 10 are common, see e.g. [40] for a recent review. In contrast, using this procedure, we arrived at a tolerance $T \approx 1$ (for 68% C.L.; CTEQ use 90%). We conclude that in our case, with just two sets of data coming from the same experiment, there are no statistical inconsistencies, and we can actually use the textbook formula (14) with $\Delta\chi^2 = 1$, which we did⁶. We note in passing that we found the model parameters to be significantly more constrained by the BCA than by the BSA data.

We can now compare the model fit (12), taking into account also the uncertainties calculated by the procedure just described, with the neural network parametrization of the CFF $\mathcal{H}$. Both results are displayed in Fig. 5, together with two models from […], where $\mathrm{Im}\,\mathcal{H}$ (upper panels) and $\mathrm{Re}\,\mathcal{H}$ (lower panels) are plotted separately. We notice several interesting features:

• In the kinematic region of the measured data (roughly the middle vertical third of the left panels), there is good agreement between the values and uncertainties of the neural networks (ascending hatches) and of the model fit (descending hatches). This shows that both fitting methods correctly interpolate the data, and that two quite different statistical methods used for the determination of uncertainties are mutually consistent.

• As one starts to extrapolate the fitted CFF $\mathcal{H}$ outside the data region (left and right thirds of the left panels and the whole of the right panels), the two methods predict markedly different shapes and uncertainties for the CFF $\mathcal{H}$. All model fits show the effects of theoretical bias by following the $x^{-\alpha}$ functional behavior at small $x$ and claiming a small uncertainty there, whereas for the neural network parametrizations it is obvious that the extrapolated functions are very unconstrained. This is particularly visible in the right panels, illustrating the difficulty of a model-independent extrapolation towards $t = 0$, which is a limit of particular interest for hadron structure studies.

• The model fit and the corresponding uncertainties (descending hatches) show the effect of the dispersion-relation integral constraint (8): $\mathrm{Im}\,\mathcal{H}$ is significantly constrained also outside the data region. This is actually a welcome feature of dispersion-relation fits, offering the opportunity to access nucleon structure in regions not directly accessible in experiment. However, its usefulness depends on the extent to which the functional forms used are firmly established.

• The model KM09b (dashed line), which was obtained in […] by including additional data and, more importantly, by extending the model to include contributions of two further CFFs, disagrees for $\mathrm{Re}\,\mathcal{H}$ with the other three fits. This emphasizes that the assumed $\mathcal{H}$ dominance might substantially affect the quality of our results.

6 A preliminary analysis taking into account all available data for DVCS on an unpolarized target suggests that the tolerance $T$ will have to be increased for a true global fit.

Figure 5: Neural network extraction of $\mathrm{Im}\,\mathcal{H}(x_B,t)$ and $\mathrm{Re}\,\mathcal{H}(x_B,t)$ (ascending hatches) from HERMES BCA and BSA [28] data, compared with three model fits, one of them with determined uncertainties (descending hatches). Horizontal axis of the left panels: $\xi = x_B/(2 - x_B)$.

5 Conclusions and outlook 

Based on HERMES measurements of lepton scattering off unpolarized protons, a reaction that should be dominated by the CFF $\mathcal{H}$, we performed the first neural network analysis of deeply virtual Compton scattering data and obtained a neural network representation of the CFF $\mathcal{H}$. This was then used to make a prediction for the beam charge-spin asymmetry measurable at COMPASS II. The Monte Carlo method of propagating experimental errors also enabled us to determine the uncertainty of the resulting CFF, and we have found that (in the kinematic region where data is available) it agrees well with the uncertainty determined by model fits and standard statistical procedures. However, neural networks combined with Monte Carlo error propagation are a conceptually cleaner and practically simpler method, in which the propagation of uncertainty from the experimental data is intrinsic. Most importantly, this provides an essentially model-independent approach to determining CFFs and GPDs.

One could argue that the neural network approach, precisely because it is essentially model-independent, does nothing more than faithfully represent the information available in the experimental data in terms of universal objects: CFFs (as presented here) or GPDs (yet to be extracted by this approach). This is valuable in itself, but it can also be considered a stepping stone towards improved model-dependent studies, which in principle offer the advantage that our detailed theoretical and phenomenological understanding of QCD dynamics can be taken into account. Such a two-step procedure would be quite natural, as models can be compared more directly to neural-network-determined CFFs or GPDs than to actually measured observables.

Finally, let us stress that we consider the presented neural network study only as a first 
step in the analysis of DVCS data. The problem here is that the number of observables 
measured at a given kinematic point is in practice not sufficient to pin down all twelve 
CFFs, some of which are kinematically suppressed. 